Adding Risky Choices

Foundational Report 5

foundations
m_1

Extension of the SEU model to include risky choice data, enabling identification of utility parameters through known-probability decisions.

Author
Published

February 15, 2026

0.1 Introduction

Report 4 revealed differential parameter recovery in model m_0: while the sensitivity parameter α and feature weights β are recovered well, the utility increments δ show wider uncertainty and slower learning as sample size increases. This doesn’t mean δ is uninformed by the data—the posteriors do concentrate relative to the prior—but the information accumulates more slowly than for other parameters.

One approach to improving δ recovery is simply to collect more data. However, decision theory suggests an alternative: incorporating risky decisions where probabilities are known to the decision-maker. This approach, grounded in the Anscombe-Aumann framework, provides a different type of information that may help constrain utility parameters more directly.

0.2 Historical Background: Risk vs. Uncertainty

The distinction between decision-making under risk and under uncertainty is one of the oldest and most fundamental in decision theory.

0.2.1 Knight’s Distinction

Frank Knight (1921) drew a sharp distinction between:

  • Risk: Situations where probabilities of outcomes are objectively known (e.g., a fair die)
  • Uncertainty: Situations where probabilities are unknown or unknowable (e.g., will it rain tomorrow?)

Knight argued that these represent fundamentally different decision environments, with profit in business arising primarily from uncertainty rather than risk.

0.2.2 Von Neumann-Morgenstern: Expected Utility Under Risk

The modern theory of rational choice under risk was axiomatized by Neumann and Morgenstern (1944). Their key insight was that preferences over lotteries—probability distributions over outcomes—could be represented by a utility function if certain axioms (completeness, transitivity, continuity, independence) were satisfied.

For a lottery \(L = (p_1, o_1; p_2, o_2; \ldots; p_K, o_K)\) giving outcome \(o_k\) with known probability \(p_k\), the expected utility is: \[ U(L) = \sum_{k=1}^K p_k \cdot u(o_k) \]

The vNM framework applies only to risk—the probabilities \(p_k\) are taken as given.

0.2.3 Savage: Subjective Expected Utility Under Uncertainty

Leonard Savage (1954) extended expected utility theory to uncertainty by introducing subjective probabilities. In his framework, the decision-maker acts as if assigning personal probabilities to states of the world and maximizing expected utility with respect to these beliefs.

Savage’s axioms imply the existence of both:

  1. A utility function \(u\) over consequences
  2. A probability measure \(P\) over states

However, Savage’s approach requires eliciting utilities and probabilities simultaneously from choice behavior—raising the identification problem we encountered with m_0.

0.2.4 Anscombe-Aumann: The Horse Lottery Resolution

Anscombe and Aumann (1963) proposed an elegant solution that bridges risk and uncertainty. Their key innovation was to consider choice objects that combine both:

NoteThe Horse Lottery Framework

An Anscombe-Aumann act maps states of the world (the “horses”) to lotteries (probability distributions over prizes).

  • The probabilities in each lottery are objective (known)
  • The probabilities across states are subjective (uncertain)

By varying the lotteries while holding the state-dependence fixed, one can identify the utility function from risky preferences. By varying the state-dependence while holding lotteries fixed, one can identify subjective probabilities.

This insight is precisely what motivates model m_1. By observing choices in both:

  1. Uncertain decisions (like Savage’s acts, mapped through features to subjective probabilities)
  2. Risky decisions (like vNM lotteries, with known probabilities)

…we can separately identify utilities and subjective probabilities.

0.2.5 The Identification Logic

As Kreps (1988) explains in his exposition of the Anscombe-Aumann framework:

“The introduction of objective lotteries… provides a way to calibrate cardinal utility… Once cardinal utility is pinned down by choices among lotteries, subjective probability can be inferred from choices among acts.”

This is exactly our strategy:

Choice Type What It Reveals Model Component
Risky (known \(p\)) Utility function shape δ parameters
Uncertain (unknown \(p\)) Subjective probability formation β parameters
Both Choice sensitivity α parameter

0.3 Model m_1 Specification

Model m_1 extends m_0 by adding N risky choice problems alongside the M uncertain choice problems. Both share the same utility function (υ, determined by δ) and sensitivity parameter (α).

0.3.1 Data Structure

Show code
# From m_1.stan data block:

# === UNCERTAIN DECISIONS (as in m_0) ===
# M: number of uncertain decision problems
# K: number of possible consequences  
# D: dimensions of alternative features
# R: number of distinct uncertain alternatives
# w[R]: feature vectors for uncertain alternatives (vector[D] each)
# I[M,R]: indicator array for which alternatives appear in which problems
# y[M]: choices for uncertain problems

# === RISKY DECISIONS (new in m_1) ===
# N: number of risky decision problems
# S: number of distinct risky alternatives  
# x[S]: probability simplexes for risky alternatives (simplex[K] each)
# J[N,S]: indicator array for which risky alternatives appear in which problems
# z[N]: choices for risky problems

0.3.2 Key Differences from m_0

The crucial difference is in how probabilities enter:

Uncertain alternatives (same as m_0): \[ \psi_{rk} = \frac{\exp(\boldsymbol{\beta}_k^\top \mathbf{w}_r)}{\sum_{k'=1}^K \exp(\boldsymbol{\beta}_{k'}^\top \mathbf{w}_r)} \] The probabilities ψ are derived from features via the learned mapping β.

Risky alternatives (new in m_1): \[ \pi_{sk} = x_{sk} \quad \text{(given as data)} \] The probabilities π are provided directly—they are the objective lottery probabilities.

0.3.3 Expected Utilities

For uncertain alternatives, expected utility is: \[ \eta^{(u)}_r = \sum_{k=1}^K \psi_{rk} \cdot \upsilon_k = \boldsymbol{\psi}_r^\top \boldsymbol{\upsilon} \]

For risky alternatives, expected utility is: \[ \eta^{(r)}_s = \sum_{k=1}^K \pi_{sk} \cdot \upsilon_k = \boldsymbol{\pi}_s^\top \boldsymbol{\upsilon} \]

The key insight: risky expected utilities depend only on υ (and hence δ), not on β. This breaks the confounding that plagued m_0.

0.3.4 Choice Probabilities

Both choice types use the same softmax rule with shared α:

\[ \chi^{(u)}_{mi} = \frac{\exp(\alpha \cdot \eta^{(u)}_{mi})}{\sum_{j=1}^{N^{(u)}_m} \exp(\alpha \cdot \eta^{(u)}_{mj})} \quad\text{and}\quad \chi^{(r)}_{ni} = \frac{\exp(\alpha \cdot \eta^{(r)}_{ni})}{\sum_{j=1}^{N^{(r)}_n} \exp(\alpha \cdot \eta^{(r)}_{nj})} \]

0.3.5 Likelihood

The log-likelihood is the sum of contributions from both choice types: \[ \log p(y, z | \theta) = \sum_{m=1}^M \log \chi^{(u)}_{m, y_m} + \sum_{n=1}^N \log \chi^{(r)}_{n, z_n} \]

0.4 Examining the Stan Implementation

Let’s examine key portions of m_1.stan:


// === PARAMETERS ===
parameters {
  real<lower=0> alpha;           // sensitivity (shared)
  matrix[K,D] beta;              // feature-to-probability mapping
  simplex[K-1] delta;            // utility increments (shared)
}

// === TRANSFORMED PARAMETERS ===
transformed parameters {
  // Shared utility function
  ordered[K] upsilon = cumulative_sum(append_row(0, delta));

  // UNCERTAIN: subjective probabilities via softmax
  for (i in 1:total_uncertain_alts) {
    psi[i] = softmax(beta * x_uncertain[i]);
  }

  // UNCERTAIN: expected utilities
  for (i in 1:total_uncertain_alts) {
    eta_uncertain[i] = dot_product(psi[i], upsilon);
  }

  // RISKY: expected utilities (no beta involved!)
  for (i in 1:total_risky_alts) {
    eta_risky[i] = dot_product(x_risky[i], upsilon);  // x_risky is known
  }

  // Choice probabilities via softmax with shared alpha
  // ... (for both uncertain and risky problems)
}

// === MODEL ===
model {
  // Priors (same as m_0)
  alpha ~ lognormal(0, 1);
  to_vector(beta) ~ std_normal();
  delta ~ dirichlet(rep_vector(1, K-1));

  // Likelihood: uncertain choices
  for (m in 1:M) {
    y[m] ~ categorical(chi_uncertain[m]);
  }

  // Likelihood: risky choices
  for (n in 1:N) {
    z[n] ~ categorical(chi_risky[n]);
  }
}

The critical line is eta_risky[i] = dot_product(x_risky[i], upsilon). Unlike uncertain alternatives where β mediates between features and probabilities, risky alternatives have their probabilities given directly. This means risky choices provide direct information about the utility vector υ.

0.5 Why Risky Choices Identify δ

Consider a concrete example. Suppose we have K=3 consequences with utilities \(\upsilon = (0, \upsilon_2, 1)\) where \(\upsilon_2 = \delta_1\) (since \(\delta_1 + \delta_2 = 1\) and utilities are constructed cumulatively).

Risky choice: Choose between:

  • Lottery A: (0.5, 0, 0.5) → outcomes 1 or 3 with equal probability
  • Lottery B: (0, 1, 0) → outcome 2 with certainty

Expected utilities: \[ \eta_A = 0.5 \cdot 0 + 0 \cdot \upsilon_2 + 0.5 \cdot 1 = 0.5 \] \[ \eta_B = 0 \cdot 0 + 1 \cdot \upsilon_2 + 0 \cdot 1 = \upsilon_2 \]

The decision-maker prefers the alternative with higher expected utility:

  • If they choose A, then \(\eta_A > \eta_B\), implying \(0.5 > \upsilon_2\)
  • If they choose B, then \(\eta_B > \eta_A\), implying \(\upsilon_2 > 0.5\)

This preference directly constrains υ₂ (and hence δ₁) without any confounding from subjective probabilities! More generally, by presenting lotteries that span the consequence space, we can triangulate the utility function with arbitrary precision (given sufficient data and lottery diversity).

0.6 Study Design for m_1

To demonstrate that adding risky choices resolves the δ identification problem, we need a controlled comparison between m_0 and m_1. Crucially, the uncertain decision problems must be identical between the two models—any differences in recovery should be attributable solely to the addition of risky choices.

We first create a base m_0 study design (matching Report 4’s configuration), then use the from_base_study() method to create an m_1 design that inherits the same uncertain alternatives and problems:

Show code
from utils.study_design import StudyDesign

# Configuration for both models (uncertain component)
config_base = {
    "M": 25,                    # Number of uncertain decision problems
    "K": 3,                     # Number of consequences
    "D": 5,                     # Feature dimensions
    "R": 15,                    # Distinct uncertain alternatives
    "min_alts_per_problem": 2,
    "max_alts_per_problem": 5,
    "feature_dist": "normal",
    "feature_params": {"loc": 0, "scale": 1},
}

# Risky problems configuration (added for m_1)
config_risky = {
    "N": 25,                    # Number of risky decision problems
    "S": 15,                    # Distinct risky alternatives
}

# Create the base m_0 study design
study_base = StudyDesign(
    M=config_base["M"],
    K=config_base["K"],
    D=config_base["D"],
    R=config_base["R"],
    min_alts_per_problem=config_base["min_alts_per_problem"],
    max_alts_per_problem=config_base["max_alts_per_problem"],
    feature_dist=config_base["feature_dist"],
    feature_params=config_base["feature_params"],
    design_name="m0_base_for_comparison"
)
study_base.generate()

print(f"Base Study Design (for m_0 and m_1 uncertain component):")
print(f"  M = {study_base.M} uncertain decision problems")
print(f"  R = {study_base.R} distinct alternatives")
print(f"  K = {study_base.K} consequences")
print(f"\nRisky Problems (added for m_1):")
print(f"  N = {config_risky['N']} risky decision problems")
print(f"  S = {config_risky['S']} distinct lotteries")
Base Study Design (for m_0 and m_1 uncertain component):
  M = 25 uncertain decision problems
  R = 15 distinct alternatives
  K = 3 consequences

Risky Problems (added for m_1):
  N = 25 risky decision problems
  S = 15 distinct lotteries

0.7 Parameter Recovery Comparison

In Report 4, we found that δ recovery was weaker than α and β recovery in model m_0, with credible intervals that narrowed more slowly as sample size increased. We hypothesized that adding risky choices might help constrain utility parameters more directly.

Now we test this hypothesis. If the Anscombe-Aumann logic holds, risky choices should provide more direct information about the utility function, potentially improving δ recovery.

0.7.1 Study Design for m_1

We create the m_1 study design using from_base_study(), which ensures that the uncertain component (alternatives and problems) is identical to the base m_0 study. Only the risky problems are newly generated. This gives us 50 total decision problems—matching the sample size used in the M=50 analysis of m_0 (Report 4):

Show code
from utils.study_design_m1 import StudyDesignM1
from analysis.parameter_recovery import ParameterRecovery
import tempfile

# Create the m_1 study design FROM the base study
# This ensures identical uncertain problems for valid comparison
study_m1 = StudyDesignM1.from_base_study(
    base_study=study_base,
    N=config_risky["N"],        # 25 risky problems
    S=config_risky["S"],        # 15 risky alternatives
    risky_probs="random",       # Random simplex probabilities (diverse lotteries)
    design_name="m1_parameter_recovery"
)
m_1 Study Design Generated:

  Uncertain Problems (inherited from base study):
    M = 25 problems
    R = 15 distinct alternatives
    Total uncertain choices: ~87

  Risky Problems (newly generated):
    N = 25 problems
    S = 15 distinct lotteries
    Total risky choices: ~88

  Shared:
    K = 3 consequences
    Total choices: ~175

0.7.2 Running Parameter Recovery for m_1

Show code
# Create output directory
output_dir_m1 = tempfile.mkdtemp(prefix="param_recovery_m1_")

# Initialize and run parameter recovery for m_1
recovery_m1 = ParameterRecovery(
    inference_model_path=os.path.join(project_root, "models", "m_1.stan"),
    sim_model_path=os.path.join(project_root, "models", "m_1_sim.stan"),
    study_design=study_m1,
    output_dir=output_dir_m1,
    n_mcmc_samples=1000,
    n_mcmc_chains=4,
    n_iterations=50
)

true_params_m1, posterior_summaries_m1 = recovery_m1.run()

m_1 Parameter Recovery Complete:
  Successful iterations: 50

0.7.3 Comparing m_0 and m_1 Recovery

Now we compare the key parameter recovery metrics between models. We focus particularly on δ, the parameter that was poorly identified in m_0:

Show code
# Compute recovery metrics for m_1

# Alpha
alpha_true_m1 = np.array([p['alpha'] for p in true_params_m1])
alpha_mean_m1 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m1])
alpha_lower_m1 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m1])
alpha_upper_m1 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m1])

alpha_bias_m1 = np.mean(alpha_mean_m1 - alpha_true_m1)
alpha_rmse_m1 = np.sqrt(np.mean((alpha_mean_m1 - alpha_true_m1)**2))
alpha_coverage_m1 = np.mean((alpha_true_m1 >= alpha_lower_m1) & (alpha_true_m1 <= alpha_upper_m1))
alpha_ci_width_m1 = np.mean(alpha_upper_m1 - alpha_lower_m1)

# Delta (using config from base study)
K_minus_1 = config_base['K'] - 1
delta_stats_m1 = []
for k in range(K_minus_1):
    param_name = f"delta[{k+1}]"
    
    delta_true = np.array([p['delta'][k] for p in true_params_m1])
    delta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m1])
    delta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m1])
    delta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m1])
    
    delta_stats_m1.append({
        'parameter': f'δ_{k+1}',
        'true': delta_true,
        'mean': delta_mean,
        'lower': delta_lower,
        'upper': delta_upper,
        'bias': np.mean(delta_mean - delta_true),
        'rmse': np.sqrt(np.mean((delta_mean - delta_true)**2)),
        'coverage': np.mean((delta_true >= delta_lower) & (delta_true <= delta_upper)),
        'ci_width': np.mean(delta_upper - delta_lower)
    })

delta_df_m1 = pd.DataFrame(delta_stats_m1)

# Beta (for comparison)
K, D = config_base['K'], config_base['D']
beta_stats_m1 = []
for k in range(K):
    for d in range(D):
        param_name = f"beta[{k+1},{d+1}]"
        beta_true = np.array([p['beta'][k][d] for p in true_params_m1])
        beta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m1])
        beta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m1])
        beta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m1])
        
        beta_stats_m1.append({
            'rmse': np.sqrt(np.mean((beta_mean - beta_true)**2)),
            'coverage': np.mean((beta_true >= beta_lower) & (beta_true <= beta_upper)),
            'ci_width': np.mean(beta_upper - beta_lower)
        })

beta_df_m1 = pd.DataFrame(beta_stats_m1)
Show code
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

for k in range(K_minus_1):
    # m_1 True vs Estimated (top row)
    ax = axes[0, k]
    ax.scatter(delta_stats_m1[k]['true'], delta_stats_m1[k]['mean'], 
               alpha=0.7, s=60, c='forestgreen', edgecolor='white', label='m_1')
    ax.plot([0, 1], [0, 1], 'r--', linewidth=2, label='Identity')
    ax.set_xlim(-0.05, 1.05)
    ax.set_ylim(-0.05, 1.05)
    ax.set_xlabel(f'True δ_{k+1}', fontsize=11)
    ax.set_ylabel(f'Estimated δ_{k+1}', fontsize=11)
    ax.set_title(f'm_1: δ_{k+1} Recovery (RMSE={delta_stats_m1[k]["rmse"]:.3f})', fontsize=11)
    ax.set_aspect('equal')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # m_1 Coverage (bottom row)
    ax = axes[1, k]
    iterations = np.arange(len(delta_stats_m1[k]['true']))
    for i in range(len(delta_stats_m1[k]['true'])):
        covered = ((delta_stats_m1[k]['true'][i] >= delta_stats_m1[k]['lower'][i]) & 
                   (delta_stats_m1[k]['true'][i] <= delta_stats_m1[k]['upper'][i]))
        color = 'forestgreen' if covered else 'crimson'
        ax.plot([i, i], [delta_stats_m1[k]['lower'][i], delta_stats_m1[k]['upper'][i]], 
                color=color, linewidth=2, alpha=0.7)
        ax.scatter(i, delta_stats_m1[k]['mean'][i], color=color, s=40, zorder=3)
    ax.scatter(iterations, delta_stats_m1[k]['true'], color='black', s=60, marker='x',
               label='True value', zorder=4, linewidth=2)
    ax.set_xlabel('Iteration', fontsize=11)
    ax.set_ylabel(f'δ_{k+1}', fontsize=11)
    ax.set_title(f'm_1: δ_{k+1} Coverage = {delta_stats_m1[k]["coverage"]:.0%}', fontsize=11)
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
Figure 1: Comparison of δ recovery between m_0 and m_1. Adding risky choices dramatically improves recovery: narrower credible intervals and better coverage.
Show code
# Run m_0 recovery on the SAME base study used for m_1's uncertain component
# This ensures a valid comparison: identical uncertain problems, differing only
# in whether risky choices are added
output_dir_m0 = tempfile.mkdtemp(prefix="param_recovery_m0_compare_")
recovery_m0_compare = ParameterRecovery(
    inference_model_path=None,  # Default m_0.stan
    sim_model_path=None,        # Default m_0_sim.stan
    study_design=study_base,    # Use the same base study as m_1 (M=25)
    output_dir=output_dir_m0,
    n_mcmc_samples=1000,
    n_mcmc_chains=4,
    n_iterations=50
)
true_params_m0, posterior_summaries_m0 = recovery_m0_compare.run()

# Run m_0 with M=50 using SAME alternatives (only new problems)
# This uses the SAME alternatives as study_base, just with 25 additional problems
study_m50 = study_base.extend(additional_M=25, design_name="m0_extended_m50")

output_dir_m0_m50 = tempfile.mkdtemp(prefix="param_recovery_m0_m50_")
recovery_m0_m50 = ParameterRecovery(
    inference_model_path=None,
    sim_model_path=None,
    study_design=study_m50,
    output_dir=output_dir_m0_m50,
    n_mcmc_samples=1000,
    n_mcmc_chains=4,
    n_iterations=50
)
true_params_m0_m50, posterior_summaries_m0_m50 = recovery_m0_m50.run()

# Run m_0 with M=50 using NEW alternatives (15 new feature vectors)
# This parallels how m_1 adds 15 new risky alternatives
study_m50_new_alts = study_base.extend_with_alternatives(
    additional_M=25,
    additional_R=15,  # Same as S in m_1
    design_name="m0_extended_m50_new_alts",
    new_problems_use_new_alts_only=True  # New problems only use new alternatives
)

output_dir_m0_m50_new = tempfile.mkdtemp(prefix="param_recovery_m0_m50_new_alts_")
recovery_m0_m50_new = ParameterRecovery(
    inference_model_path=None,
    sim_model_path=None,
    study_design=study_m50_new_alts,
    output_dir=output_dir_m0_m50_new,
    n_mcmc_samples=1000,
    n_mcmc_chains=4,
    n_iterations=50
)
true_params_m0_m50_new, posterior_summaries_m0_m50_new = recovery_m0_m50_new.run()

# Compute m_0 delta metrics for all three m_0 conditions
delta_stats_m0 = []
delta_stats_m0_m50 = []
delta_stats_m0_m50_new = []
for k in range(K_minus_1):
    param_name = f"delta[{k+1}]"
    
    # M=25 (base)
    delta_true = np.array([p['delta'][k] for p in true_params_m0])
    delta_mean = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m0])
    delta_lower = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m0])
    delta_upper = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m0])
    
    delta_stats_m0.append({
        'parameter': f'δ_{k+1}',
        'rmse': np.sqrt(np.mean((delta_mean - delta_true)**2)),
        'coverage': np.mean((delta_true >= delta_lower) & (delta_true <= delta_upper)),
        'ci_width': np.mean(delta_upper - delta_lower)
    })
    
    # M=50 (same alternatives)
    delta_true_m50 = np.array([p['delta'][k] for p in true_params_m0_m50])
    delta_mean_m50 = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m0_m50])
    delta_lower_m50 = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m0_m50])
    delta_upper_m50 = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m0_m50])
    
    delta_stats_m0_m50.append({
        'parameter': f'δ_{k+1}',
        'rmse': np.sqrt(np.mean((delta_mean_m50 - delta_true_m50)**2)),
        'coverage': np.mean((delta_true_m50 >= delta_lower_m50) & (delta_true_m50 <= delta_upper_m50)),
        'ci_width': np.mean(delta_upper_m50 - delta_lower_m50)
    })
    
    # M=50 (new alternatives)
    delta_true_m50_new = np.array([p['delta'][k] for p in true_params_m0_m50_new])
    delta_mean_m50_new = np.array([s.loc[param_name, 'Mean'] for s in posterior_summaries_m0_m50_new])
    delta_lower_m50_new = np.array([s.loc[param_name, '5%'] for s in posterior_summaries_m0_m50_new])
    delta_upper_m50_new = np.array([s.loc[param_name, '95%'] for s in posterior_summaries_m0_m50_new])
    
    delta_stats_m0_m50_new.append({
        'parameter': f'δ_{k+1}',
        'rmse': np.sqrt(np.mean((delta_mean_m50_new - delta_true_m50_new)**2)),
        'coverage': np.mean((delta_true_m50_new >= delta_lower_m50_new) & (delta_true_m50_new <= delta_upper_m50_new)),
        'ci_width': np.mean(delta_upper_m50_new - delta_lower_m50_new)
    })

# Also compute alpha metrics for all m_0 conditions
alpha_true_m0_m50 = np.array([p['alpha'] for p in true_params_m0_m50])
alpha_mean_m0_m50 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m0_m50])
alpha_lower_m0_m50 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m0_m50])
alpha_upper_m0_m50 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m0_m50])
alpha_rmse_m0_m50 = np.sqrt(np.mean((alpha_mean_m0_m50 - alpha_true_m0_m50)**2))
alpha_coverage_m0_m50 = np.mean((alpha_true_m0_m50 >= alpha_lower_m0_m50) & (alpha_true_m0_m50 <= alpha_upper_m0_m50))
alpha_ci_width_m0_m50 = np.mean(alpha_upper_m0_m50 - alpha_lower_m0_m50)

alpha_true_m0_m50_new = np.array([p['alpha'] for p in true_params_m0_m50_new])
alpha_mean_m0_m50_new = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m0_m50_new])
alpha_lower_m0_m50_new = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m0_m50_new])
alpha_upper_m0_m50_new = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m0_m50_new])
alpha_rmse_m0_m50_new = np.sqrt(np.mean((alpha_mean_m0_m50_new - alpha_true_m0_m50_new)**2))
alpha_coverage_m0_m50_new = np.mean((alpha_true_m0_m50_new >= alpha_lower_m0_m50_new) & (alpha_true_m0_m50_new <= alpha_upper_m0_m50_new))
alpha_ci_width_m0_m50_new = np.mean(alpha_upper_m0_m50_new - alpha_lower_m0_m50_new)
Figure 2
Show code
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

params = ['δ₁', 'δ₂']
x = np.arange(len(params))
width = 0.2

# RMSE comparison
ax = axes[0]
rmse_m0 = [delta_stats_m0[k]['rmse'] for k in range(K_minus_1)]
rmse_m0_m50 = [delta_stats_m0_m50[k]['rmse'] for k in range(K_minus_1)]
rmse_m0_m50_new = [delta_stats_m0_m50_new[k]['rmse'] for k in range(K_minus_1)]
rmse_m1 = [delta_stats_m1[k]['rmse'] for k in range(K_minus_1)]

bars1 = ax.bar(x - 1.5*width, rmse_m0, width, label='m_0 (M=25)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x - 0.5*width, rmse_m0_m50, width, label='m_0 (M=50 same)', color='coral', alpha=0.7)
bars3 = ax.bar(x + 0.5*width, rmse_m0_m50_new, width, label='m_0 (M=50 new)', color='mediumseagreen', alpha=0.7)
bars4 = ax.bar(x + 1.5*width, rmse_m1, width, label='m_1 (M=25+N=25)', color='purple', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('RMSE', fontsize=12)
ax.set_title('δ RMSE by Condition', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3, axis='y')

# Coverage comparison
ax = axes[1]
cov_m0 = [delta_stats_m0[k]['coverage'] for k in range(K_minus_1)]
cov_m0_m50 = [delta_stats_m0_m50[k]['coverage'] for k in range(K_minus_1)]
cov_m0_m50_new = [delta_stats_m0_m50_new[k]['coverage'] for k in range(K_minus_1)]
cov_m1 = [delta_stats_m1[k]['coverage'] for k in range(K_minus_1)]

bars1 = ax.bar(x - 1.5*width, cov_m0, width, label='m_0 (M=25)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x - 0.5*width, cov_m0_m50, width, label='m_0 (M=50 same)', color='coral', alpha=0.7)
bars3 = ax.bar(x + 0.5*width, cov_m0_m50_new, width, label='m_0 (M=50 new)', color='mediumseagreen', alpha=0.7)
bars4 = ax.bar(x + 1.5*width, cov_m1, width, label='m_1 (M=25+N=25)', color='purple', alpha=0.7)
ax.axhline(y=0.90, color='red', linestyle='--', linewidth=2, label='Target (90%)')
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('δ 90% CI Coverage by Condition', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.set_ylim(0, 1.05)
ax.legend(loc='lower right', fontsize=9)
ax.grid(True, alpha=0.3, axis='y')

# CI Width comparison
ax = axes[2]
ci_m0 = [delta_stats_m0[k]['ci_width'] for k in range(K_minus_1)]
ci_m0_m50 = [delta_stats_m0_m50[k]['ci_width'] for k in range(K_minus_1)]
ci_m0_m50_new = [delta_stats_m0_m50_new[k]['ci_width'] for k in range(K_minus_1)]
ci_m1 = [delta_stats_m1[k]['ci_width'] for k in range(K_minus_1)]

bars1 = ax.bar(x - 1.5*width, ci_m0, width, label='m_0 (M=25)', color='steelblue', alpha=0.7)
bars2 = ax.bar(x - 0.5*width, ci_m0_m50, width, label='m_0 (M=50 same)', color='coral', alpha=0.7)
bars3 = ax.bar(x + 0.5*width, ci_m0_m50_new, width, label='m_0 (M=50 new)', color='mediumseagreen', alpha=0.7)
bars4 = ax.bar(x + 1.5*width, ci_m1, width, label='m_1 (M=25+N=25)', color='purple', alpha=0.7)
ax.set_xlabel('Parameter', fontsize=12)
ax.set_ylabel('CI Width', fontsize=12)
ax.set_title('δ 90% CI Width by Condition', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(params)
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
Figure 3: Comparison of δ recovery across four conditions: m_0 (M=25), m_0 (M=50 same alts), m_0 (M=50 new alts), and m_1 (M=25 + N=25 risky). The key comparison is between m_0 with new uncertain alternatives vs. m_1 with new risky alternatives.

0.7.4 Summary Statistics

Show code
# Build comparison table with all four conditions
comparison_rows = []

# Compute alpha metrics for m_0 M=25
alpha_true_m0 = np.array([p['alpha'] for p in true_params_m0])
alpha_mean_m0 = np.array([s.loc['alpha', 'Mean'] for s in posterior_summaries_m0])
alpha_lower_m0 = np.array([s.loc['alpha', '5%'] for s in posterior_summaries_m0])
alpha_upper_m0 = np.array([s.loc['alpha', '95%'] for s in posterior_summaries_m0])
alpha_rmse_m0 = np.sqrt(np.mean((alpha_mean_m0 - alpha_true_m0)**2))
alpha_coverage_m0 = np.mean((alpha_true_m0 >= alpha_lower_m0) & (alpha_true_m0 <= alpha_upper_m0))
alpha_ci_width_m0 = np.mean(alpha_upper_m0 - alpha_lower_m0)

# Alpha row
comparison_rows.append({
    'Parameter': 'α',
    'm_0 (M=25)': f'{alpha_rmse_m0:.4f}',
    'm_0 (M=50 same)': f'{alpha_rmse_m0_m50:.4f}',
    'm_0 (M=50 new)': f'{alpha_rmse_m0_m50_new:.4f}',
    'm_1': f'{alpha_rmse_m1:.4f}',
})

# Delta rows
for k in range(K_minus_1):
    comparison_rows.append({
        'Parameter': f'δ_{k+1}',
        'm_0 (M=25)': f'{delta_stats_m0[k]["rmse"]:.4f}',
        'm_0 (M=50 same)': f'{delta_stats_m0_m50[k]["rmse"]:.4f}',
        'm_0 (M=50 new)': f'{delta_stats_m0_m50_new[k]["rmse"]:.4f}',
        'm_1': f'{delta_stats_m1[k]["rmse"]:.4f}',
    })

comparison_df = pd.DataFrame(comparison_rows)
print("RMSE Comparison:")
print(comparison_df.to_string(index=False))
Table 1: Parameter recovery comparison across four conditions. The key comparison is m_0 with new uncertain alternatives vs. m_1 with new risky alternatives.
RMSE Comparison:
Parameter m_0 (M=25) m_0 (M=50 same) m_0 (M=50 new)    m_1
        α     1.1685          0.7617         0.7589 0.7329
      δ_1     0.3048          0.2946         0.2850 0.2722
      δ_2     0.3048          0.2946         0.2850 0.2722
Show code
# Build CI width comparison table
ci_rows = []

ci_rows.append({
    'Parameter': 'α',
    'm_0 (M=25)': f'{alpha_ci_width_m0:.3f}',
    'm_0 (M=50 same)': f'{alpha_ci_width_m0_m50:.3f}',
    'm_0 (M=50 new)': f'{alpha_ci_width_m0_m50_new:.3f}',
    'm_1 (M=25+N=25)': f'{alpha_ci_width_m1:.3f}'
})

for k in range(K_minus_1):
    ci_rows.append({
        'Parameter': f'δ_{k+1}',
        'm_0 (M=25)': f'{delta_stats_m0[k]["ci_width"]:.3f}',
        'm_0 (M=50 same)': f'{delta_stats_m0_m50[k]["ci_width"]:.3f}',
        'm_0 (M=50 new)': f'{delta_stats_m0_m50_new[k]["ci_width"]:.3f}',
        'm_1 (M=25+N=25)': f'{delta_stats_m1[k]["ci_width"]:.3f}'
    })

ci_df = pd.DataFrame(ci_rows)
print(ci_df.to_string(index=False))
Table 2: 90% credible interval widths across all four conditions.
Parameter m_0 (M=25) m_0 (M=50 same) m_0 (M=50 new) m_1 (M=25+N=25)
        α      3.252           2.550          2.673           2.694
      δ_1      0.892           0.876          0.879           0.837
      δ_2      0.892           0.876          0.879           0.837
TipKey Result: Comparing Uncertain vs. Risky Alternative Extensions

The four-way comparison now isolates what drives δ recovery improvement:

Condition Description What it tests
m_0 (M=25) Base study Baseline
m_0 (M=50 same) +25 problems, same R=15 alternatives More data, same feature space
m_0 (M=50 new) +25 problems using 15 new alternatives Expanded feature space
m_1 (M=25+N=25) +25 risky problems using 15 new lotteries Known probabilities

The crucial comparison is between conditions 3 and 4: - Both add 25 new problems - Both introduce 15 new choice objects (uncertain alternatives vs. risky lotteries) - The key difference: risky alternatives have known probabilities, uncertain alternatives have inferred probabilities

If m_1 outperforms m_0 (M=50 new), it demonstrates that the type of data matters—not just the amount or diversity. Risky choices provide qualitatively different information about utilities because they bypass the probability inference step entirely.

0.7.5 Why This Helps

The improvement from adding risky choices can be understood through the likelihood structure:

In m_0 (uncertain choices only): \[ \log p(y | \theta) = \sum_{m=1}^M \log \chi_{m, y_m} \quad \text{where} \quad \chi \propto \exp(\alpha \cdot \boldsymbol{\psi}^\top \boldsymbol{\upsilon}) \]

Here, ψ (from β) and υ (from δ) appear together in expected utilities. While not completely confounded, information about δ must be extracted indirectly.

In m_1 (uncertain + risky choices): \[ \log p(y, z | \theta) = \underbrace{\sum_{m=1}^M \log \chi^{(u)}_{m, y_m}}_{\text{Informs } \alpha, \beta, \delta} + \underbrace{\sum_{n=1}^N \log \chi^{(r)}_{n, z_n}}_{\text{More directly informs } \delta} \]

The risky choice likelihood depends on δ through υ = cumsum(δ), but not on β. This provides a more direct channel for learning about utilities, supplementing the indirect information from uncertain choices.

0.8 Summary

The extension from m_0 to m_1 follows a principled path grounded in decision theory:

  1. Differential recovery in m_0 (Report 4): Utility parameters show wider uncertainty than sensitivity parameters, though they do respond to data
  2. Three extensions compared: More problems (same alternatives), more problems (new alternatives), risky problems (new lotteries)
  3. The historical insight: Anscombe-Aumann recognized that combining risk and uncertainty provides complementary information about preferences
  4. The key comparison: m_0 with new uncertain alternatives vs. m_1 with new risky alternatives isolates whether known probabilities provide unique advantages
TipKey Insight

The four-way comparison reveals whether the benefit of risky choices comes from: - Simply having more diverse choice problems (testable via m_0 with new alternatives), or - The qualitatively different information provided by known probabilities (unique to m_1)

If m_1 outperforms the expanded m_0, it validates the Anscombe-Aumann insight: risky choices provide a more direct route to utility identification because they bypass the subjective probability inference step.

The choice between approaches depends on practical considerations: the feasibility of including risky choice tasks, the importance of precise utility estimation, and whether the research question requires separating utilities from beliefs.

0.9 Formal Identification Analysis

The empirical results above demonstrate that m_1 achieves better parameter recovery than m_0, particularly for the utility parameters δ. In this section, we provide a formal analysis of why this occurs by establishing identification results for both models.

0.9.1 Notation and Setup

Let the full parameter vector be \(\theta = (\alpha, \beta, \delta)\) where:

  • \(\alpha > 0\) is the sensitivity parameter
  • \(\beta \in \mathbb{R}^{K \times D}\) is the feature-to-probability mapping matrix
  • \(\delta \in \Delta^{K-2}\) is the utility increment simplex, i.e., \(\delta = (\delta_1, \ldots, \delta_{K-1})\) with \(\delta_k > 0\) and \(\sum_{k=1}^{K-1} \delta_k = 1\)

The utility vector \(\upsilon \in \mathbb{R}^K\) is constructed as: \[ \upsilon_1 = 0, \quad \upsilon_k = \sum_{j=1}^{k-1} \delta_j \text{ for } k = 2, \ldots, K \]

This ensures \(\upsilon\) is ordered with \(0 = \upsilon_1 < \upsilon_2 < \cdots < \upsilon_K = 1\).

For an uncertain alternative with feature vector \(\mathbf{w} \in \mathbb{R}^D\), the subjective probability vector is: \[ \psi_k(\beta, \mathbf{w}) = \frac{\exp(\boldsymbol{\beta}_k^\top \mathbf{w})}{\sum_{j=1}^K \exp(\boldsymbol{\beta}_j^\top \mathbf{w})} \]

For a risky alternative with probability vector \(\boldsymbol{\pi} \in \Delta^{K-1}\), the probabilities are simply \(\boldsymbol{\pi}\) (given as data).

Expected utilities are: \[ \eta^{(u)}(\theta, \mathbf{w}) = \boldsymbol{\psi}(\beta, \mathbf{w})^\top \boldsymbol{\upsilon}(\delta), \quad \eta^{(r)}(\delta, \boldsymbol{\pi}) = \boldsymbol{\pi}^\top \boldsymbol{\upsilon}(\delta) \]

Choice probabilities in a problem with alternatives yielding expected utilities \(\eta_1, \ldots, \eta_n\) are: \[ \chi_i(\alpha, \eta) = \frac{\exp(\alpha \cdot \eta_i)}{\sum_{j=1}^n \exp(\alpha \cdot \eta_j)} \]

0.9.2 Definition of Identifiability

NoteGlobal Identifiability

Parameters \(\theta\) are globally identifiable from a class of decision problems \(\mathcal{P}\) if for any \(\theta \neq \theta'\), there exists at least one problem \(P \in \mathcal{P}\) such that the induced choice probability distributions differ: \[ \chi^P(\theta) \neq \chi^P(\theta') \]

This definition captures whether, in principle, sufficiently rich data can distinguish any two parameter configurations. It is a necessary condition for consistent estimation.

0.9.3 Identification in Model m_1

We first establish that m_1 achieves global identification under mild conditions.

ImportantTheorem (Global Identifiability of m_1)

Consider model m_1 with \(K \geq 2\) consequences and \(D \geq 1\) feature dimensions. Let the class of decision problems include:

  1. Risky problems with probability vectors from a set \(\mathcal{X} \subseteq \Delta^{K-1}\) that contains \(K\) affinely independent probability vectors
  2. Uncertain problems with feature vectors from a set \(\mathcal{W} \subseteq \mathbb{R}^D\) that spans \(\mathbb{R}^D\)

Then for any \(\theta = (\alpha, \beta, \delta) \neq \theta' = (\alpha', \beta', \delta')\) with \(\alpha, \alpha' > 0\), there exists a binary choice problem (with exactly two alternatives) where the choice probabilities under \(\theta\) and \(\theta'\) differ.

Before proving the theorem, we establish a key lemma.

NoteLemma (Affine Independence Implies Linear Independence for Probability Vectors)

Let \(\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(K)} \in \Delta^{K-1}\) be \(K\) affinely independent probability vectors. Then they are linearly independent in \(\mathbb{R}^K\).

Proof of Lemma. Recall that vectors \(\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(K)}\) are affinely dependent if there exist scalars \(c_1, \ldots, c_K\), not all zero, such that \(\sum_s c_s = 0\) and \(\sum_s c_s \boldsymbol{\pi}^{(s)} = \mathbf{0}\). Equivalently, they are affinely independent if no such non-trivial relation exists.

Now suppose \(\sum_{s=1}^K c_s \boldsymbol{\pi}^{(s)} = \mathbf{0}\) for some scalars \(c_1, \ldots, c_K\) (a linear dependence relation). Taking the inner product with \(\mathbf{1} = (1, \ldots, 1)^\top\) and using the fact that \(\mathbf{1}^\top \boldsymbol{\pi}^{(s)} = 1\) for all probability vectors, we obtain: \[ 0 = \mathbf{1}^\top \mathbf{0} = \mathbf{1}^\top \left( \sum_{s=1}^K c_s \boldsymbol{\pi}^{(s)} \right) = \sum_{s=1}^K c_s (\mathbf{1}^\top \boldsymbol{\pi}^{(s)}) = \sum_{s=1}^K c_s \]

Thus any linear dependence among probability vectors automatically satisfies \(\sum_s c_s = 0\). This means any linear dependence would also constitute an affine dependence. Since our vectors are affinely independent by assumption, no non-trivial linear dependence can exist. Therefore \(c_s = 0\) for all \(s\), establishing linear independence. \(\square\)

Proof of Theorem. We proceed by cases, showing that any difference in parameters can be detected by a binary choice problem.

Case 1: \(\delta \neq \delta'\) (utilities differ).

By assumption, \(\mathcal{X}\) contains \(K\) affinely independent probability vectors \(\boldsymbol{\pi}^{(1)}, \ldots, \boldsymbol{\pi}^{(K)}\). By the Lemma, these are linearly independent in \(\mathbb{R}^K\).

Let \(\Pi\) be the \(K \times K\) matrix with rows \((\boldsymbol{\pi}^{(s)})^\top\). Since these rows are linearly independent, \(\Pi\) is invertible. Consider the expected utilities under both parameter configurations: \[ \eta^{(r)}_s = (\boldsymbol{\pi}^{(s)})^\top \boldsymbol{\upsilon}(\delta), \quad (\eta')^{(r)}_s = (\boldsymbol{\pi}^{(s)})^\top \boldsymbol{\upsilon}(\delta') \]

In vector form: \(\boldsymbol{\eta} = \Pi \boldsymbol{\upsilon}(\delta)\) and \(\boldsymbol{\eta}' = \Pi \boldsymbol{\upsilon}(\delta')\). Since \(\Pi\) is invertible: \[ \boldsymbol{\eta} = \boldsymbol{\eta}' \iff \boldsymbol{\upsilon}(\delta) = \boldsymbol{\upsilon}(\delta') \]

Now, \(\delta \neq \delta'\) implies \(\boldsymbol{\upsilon}(\delta) \neq \boldsymbol{\upsilon}(\delta')\) because the map \(\delta \mapsto \boldsymbol{\upsilon}(\delta) = \text{cumsum}([0, \delta])\) is injective on \(\Delta^{K-2}\). Therefore, \(\boldsymbol{\eta} \neq \boldsymbol{\eta}'\), meaning at least one pair of expected utilities differs: \(\eta^{(r)}_s \neq (\eta')^{(r)}_s\) for some \(s\).

Construct a risky choice problem with alternatives \(\boldsymbol{\pi}^{(s)}\) and \(\boldsymbol{\pi}^{(t)}\) where the expected utility differences satisfy: \[ \eta^{(r)}_s - \eta^{(r)}_t \neq (\eta')^{(r)}_s - (\eta')^{(r)}_t \] (This is achievable since \(\boldsymbol{\eta} \neq \boldsymbol{\eta}'\).) The choice probability is: \[ \chi_1 = \frac{\exp(\alpha \cdot \eta^{(r)}_s)}{\exp(\alpha \cdot \eta^{(r)}_s) + \exp(\alpha \cdot \eta^{(r)}_t)} = \sigma\bigl(\alpha(\eta^{(r)}_s - \eta^{(r)}_t)\bigr) \] where \(\sigma\) is the logistic function. Since \(\sigma\) is strictly monotonic and the utility differences differ, we have \(\chi_1(\theta) \neq \chi_1(\theta')\) (regardless of whether \(\alpha = \alpha'\) or not).

Case 2: \(\delta = \delta'\) but \(\alpha \neq \alpha'\).

Consider any risky choice problem with two alternatives having distinct expected utilities \(\eta_1 \neq \eta_2\). The choice probability for alternative 1 is: \[ \chi_1 = \sigma\bigl(\alpha(\eta_1 - \eta_2)\bigr) \]

This is a strictly monotonic function of \(\alpha\) when \(\eta_1 \neq \eta_2\), so \(\chi_1(\alpha) \neq \chi_1(\alpha')\).

Case 3: \(\delta = \delta'\), \(\alpha = \alpha'\), but \(\beta \neq \beta'\).

Since \(\beta \neq \beta'\) as \(K \times D\) matrices, there exist indices \((k, d)\) such that \(\beta_{kd} \neq \beta'_{kd}\). Since \(\mathcal{W}\) spans \(\mathbb{R}^D\), we can find \(\mathbf{w} \in \mathcal{W}\) such that the difference \((\beta - \beta')\mathbf{w}\) is not a constant vector (i.e., not a scalar multiple of \(\mathbf{1}\)).

Consider the subjective probability vectors \(\boldsymbol{\psi}(\beta, \mathbf{w}) = \text{softmax}(\beta \mathbf{w})\) and \(\boldsymbol{\psi}(\beta', \mathbf{w}) = \text{softmax}(\beta' \mathbf{w})\). The softmax function has the property that \(\text{softmax}(\mathbf{v}) = \text{softmax}(\mathbf{v}')\) if and only if \(\mathbf{v} - \mathbf{v}' = c \cdot \mathbf{1}\) for some scalar \(c\). Since \(\beta \mathbf{w} - \beta' \mathbf{w}\) is not a constant vector, we have \(\boldsymbol{\psi}(\beta, \mathbf{w}) \neq \boldsymbol{\psi}(\beta', \mathbf{w})\).

Since \(\delta = \delta'\) implies \(\boldsymbol{\upsilon}(\delta) = \boldsymbol{\upsilon}(\delta')\), and \(\boldsymbol{\psi} \neq \boldsymbol{\psi}'\), the expected utilities differ: \[ \eta^{(u)}(\mathbf{w}) = \boldsymbol{\psi}^\top \boldsymbol{\upsilon} \neq (\boldsymbol{\psi}')^\top \boldsymbol{\upsilon} = (\eta')^{(u)}(\mathbf{w}) \] (The equality would require \(\boldsymbol{\psi} - \boldsymbol{\psi}'\) to be orthogonal to \(\boldsymbol{\upsilon}\), which fails except on a set of measure zero since \(\boldsymbol{\upsilon}\) has distinct ordered components.)

Constructing an uncertain choice problem with this alternative yields different choice probabilities.

Combining the cases: For any \(\theta \neq \theta'\), at least one of the three cases applies, guaranteeing the existence of a distinguishing decision problem. \(\square\)

0.9.4 Non-Identifiability in Model m_0

In contrast, model m_0 faces a more challenging identification situation because the risky choice channel is absent.

WarningProposition (Identification Structure of m_0)

In model m_0 (uncertain choices only):

  1. The sensitivity parameter \(\alpha\) is globally identified.
  2. The expected utility function \(\eta^{(u)}: \mathbb{R}^D \to [0,1]\) is globally identified.
  3. The individual components \(\beta\) and \(\delta\) are weakly identified through the composite mapping \((\beta, \delta) \mapsto \eta^{(u)}(\cdot)\). While generic point identification holds (distinct \((\beta, \delta)\) pairs produce distinct expected utility functions except on a set of measure zero), information about these parameters accumulates more slowly from data compared to m_1, resulting in wider posterior credible intervals.

Proof. We establish each claim.

Claim 1 (\(\alpha\) is identified): Consider any two uncertain alternatives with features \(\mathbf{w}_1, \mathbf{w}_2\) yielding distinct expected utilities \(\eta_1 \neq \eta_2\) (such alternatives exist generically). The choice probability is \(\chi_1 = \sigma(\alpha(\eta_1 - \eta_2))\) where \(\sigma\) is the logistic function. Since \(\sigma\) is strictly monotonic and \((\eta_1 - \eta_2) \neq 0\), different values of \(\alpha\) yield different choice probabilities.

Claim 2 (\(\eta^{(u)}(\cdot)\) is identified): Suppose two parameter configurations \((\alpha, \beta, \delta)\) and \((\alpha', \beta', \delta')\) yield the same expected utility function: \(\eta^{(u)}(\mathbf{w}; \beta, \delta) = \eta^{(u)}(\mathbf{w}; \beta', \delta')\) for all \(\mathbf{w}\). Then for any uncertain choice problem with alternatives having features \(\mathbf{w}_1, \ldots, \mathbf{w}_n\), the expected utilities are identical, and hence (given \(\alpha = \alpha'\) from Claim 1) the choice probabilities are identical. Conversely, if \(\eta^{(u)}(\mathbf{w}; \beta, \delta) \neq \eta^{(u)}(\mathbf{w}; \beta', \delta')\) for some \(\mathbf{w}\), we can construct a binary choice problem distinguishing the two configurations (by the same argument as in the m_1 proof).

Claim 3 (weak identification of components): We analyze the information structure for identifying \(\beta\) and \(\delta\) separately. Consider \(K = 3\) with utility vector \(\boldsymbol{\upsilon} = (0, \delta_1, 1)\). The expected utility is: \[ \eta^{(u)}(\mathbf{w}) = \psi_2(\beta, \mathbf{w}) \cdot \delta_1 + \psi_3(\beta, \mathbf{w}) \] where \(\psi_2 + \psi_3 = 1 - \psi_1\). The softmax probabilities \(\psi_k\) depend on \(\beta\) only through the differences \(\boldsymbol{\beta}_k - \boldsymbol{\beta}_1\) (the softmax is invariant to adding a constant to all logits). Thus \(\beta\) has a \((K-1) \times D = 2D\) dimensional “effective” parameter space for determining \(\psi\).

The expected utility function \(\eta^{(u)}: \mathbb{R}^D \to [0,1]\) is determined by how \(\psi_2\) and \(\psi_3\) vary with \(\mathbf{w}\), weighted by \(\delta_1\) and \(1\) respectively. The question is whether distinct \((\beta, \delta)\) pairs can produce identical \(\eta^{(u)}(\cdot)\) functions.

Generic point identification holds: For the equality \(\psi_2(\beta, \mathbf{w}) \cdot \delta_1 + \psi_3(\beta, \mathbf{w}) = \psi_2(\beta', \mathbf{w}) \cdot \delta'_1 + \psi_3(\beta', \mathbf{w})\) to hold for all \(\mathbf{w} \in \mathbb{R}^D\), we would need a precise functional relationship between \((\beta, \delta)\) and \((\beta', \delta')\) that holds across the entire feature space. This occurs only on a set of measure zero in the parameter space.

Weak identification in practice: While generic point identification holds, the rate at which information accumulates about \(\beta\) and \(\delta\) separately is slower than in m_1. In m_0, all information about \(\delta\) must be extracted through uncertain choices where \(\beta\) and \(\delta\) interact multiplicatively in the expected utility. This interaction means that uncertainty about \(\beta\) propagates to uncertainty about \(\delta\) and vice versa, resulting in slower posterior concentration. \(\square\)

Discussion. The proposition clarifies what m_0 can and cannot identify:

  • What is strongly identified: The sensitivity \(\alpha\) and the expected utility function \(\eta^{(u)}(\cdot)\) are fully determined by choice data. This means m_0 can predict choice behavior for any new uncertain alternative, given its features.

  • What is weakly identified: The decomposition of \(\eta^{(u)}\) into beliefs (\(\psi\), via \(\beta\)) and utilities (\(\upsilon\), via \(\delta\)) faces a more challenging inference problem. While distinct \((\beta, \delta)\) pairs generically produce distinct expected utility functions (point identification), the information geometry is unfavorable: \(\beta\) and \(\delta\) interact multiplicatively in the expected utility, so uncertainty in one parameter propagates to the other. In m_1, risky choices directly constrain \(\delta\) through the likelihood term \(\sum_n \log \chi^{(r)}_{n, z_n}\), which depends on \(\delta\) but not \(\beta\). This separation breaks the multiplicative coupling and allows faster learning.

  • Practical consequence: The weak identification of \(\delta\) in m_0 manifests as slower posterior concentration—wider credible intervals that shrink more slowly with sample size compared to m_1. This matches our empirical parameter recovery findings. Note that this is a rate phenomenon, not a fundamental non-identification: with sufficient data, m_0 posteriors do concentrate on the true values.

0.9.5 Implications for Experimental Design

The identification results have direct implications for experimental design:

Design Choice Effect on Identification
Include risky choices with diverse lotteries Enables direct identification of \(\delta\) (and hence \(\upsilon\))
Use spanning set of probability vectors Ensures full utility function is identified, not just projections
Use spanning set of feature vectors Ensures full \(\beta\) matrix is identified
Include both simple (degenerate) and mixed lotteries Provides both “anchoring” and “interpolation” information about utilities
NotePractical Guidance

For m_1, the spanning conditions are easily satisfied:

  1. Risky alternatives: Include the \(K\) “corner” lotteries \(\mathbf{e}_1, \ldots, \mathbf{e}_K\) (certain outcomes) plus several interior lotteries. The corners alone provide \(K\) linearly independent vectors.

  2. Uncertain alternatives: Use feature vectors that vary across all \(D\) dimensions. Random draws from a continuous distribution (as in our study design) satisfy this almost surely.

The key practical constraint is whether risky choice tasks are feasible and meaningful in the application domain.

0.9.6 Remarks on the Identification Results

The identification analysis clarifies why m_1 achieves better parameter recovery than m_0:

  1. Separation of constraints: In m_1, risky choices provide constraints on \((\alpha, \delta)\) that are independent of \(\beta\). This separation means that information about utilities accumulates directly from risky choice data, without needing to simultaneously infer the belief-formation process.

  2. The role of known probabilities: The key advantage of risky alternatives is not merely that they add more data, but that their probability vectors are given rather than inferred. This eliminates one layer of the inference problem.

  3. Connection to classical results: Our identification result for m_1 is related to—but distinct from—the Anscombe-Aumann representation theorem. That theorem establishes when preferences can be represented by subjective expected utility; our result establishes when the parameters of a specific SEU model can be recovered from choice data. The underlying intuition is similar: mixing objective and subjective probabilities provides complementary information.

0.10 References

Anscombe, Francis J., and Robert J. Aumann. 1963. “A Definition of Subjective Probability.” Annals of Mathematical Statistics 34 (1): 199–205.
Knight, Frank H. 1921. Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Kreps, David M. 1988. Notes on the Theory of Choice. Boulder, CO: Westview Press.
Neumann, John von, and Oskar Morgenstern. 1944. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press.
Savage, Leonard J. 1954. The Foundations of Statistics. New York: John Wiley & Sons.

Reuse

Citation

BibTeX citation:
@online{helzner2026,
  author = {Helzner, Jeff},
  title = {Adding {Risky} {Choices}},
  date = {2026-02-15},
  url = {https://jeffhelzner.github.io/seu-sensitivity/foundations/05_adding_risky_choices.html},
  langid = {en}
}
For attribution, please cite this work as:
Helzner, Jeff. 2026. “Adding Risky Choices.” SEU Sensitivity Project. February 15, 2026. https://jeffhelzner.github.io/seu-sensitivity/foundations/05_adding_risky_choices.html.